Multi-Modal Conversational Search and Browse
Authors
Abstract
In this paper, we create an open-domain conversational system by combining the power of internet browser interfaces with multi-modal inputs and data mined from web search and browser logs. The work focuses on two novel components: (1) dynamic contextual adaptation of speech recognition and understanding models using visual context, and (2) fusion of users' speech and gesture inputs to understand their intents and associated arguments. The system was evaluated in a living room setup with live test subjects on a real-time implementation of the multi-modal dialog system. Users interacted with a television browser using gestures and speech; gestures were captured by Microsoft Kinect skeleton tracking, and speech was recorded by a Kinect microphone array. Results show a 16% error rate reduction (ERR) from contextual ASR adaptation to clickable web page content, and a 7-10% ERR when using gestures together with speech. Analysis of the results suggests a strategy for multi-modal intent selection when users clearly and persistently indicate pointing intent (e.g., eye gaze), giving a 54.7% ERR over lexical features.
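The fusion of speech and gesture inputs described above can be illustrated with a minimal late-fusion sketch: each modality independently scores a set of candidate intents, and the fused intent maximizes a weighted log-linear combination of the two score distributions. This is not the paper's implementation; the function name, the intent labels, and the modality weight are all illustrative assumptions.

```python
import math

def fuse_intents(speech_scores, gesture_scores, speech_weight=0.7):
    """Pick the intent maximizing a weighted log-linear fusion of two modalities.

    speech_scores, gesture_scores: dicts mapping intent name -> probability.
    Intents missing from one modality receive a small floor probability so
    the log stays finite. Weights and labels here are illustrative only.
    """
    floor = 1e-6
    intents = set(speech_scores) | set(gesture_scores)
    fused = {}
    for intent in intents:
        s = max(speech_scores.get(intent, floor), floor)
        g = max(gesture_scores.get(intent, floor), floor)
        # Log-linear combination: higher speech_weight trusts ASR more.
        fused[intent] = speech_weight * math.log(s) + (1 - speech_weight) * math.log(g)
    return max(fused, key=fused.get)

# Speech weakly favors "search", but a confident pointing gesture
# favors "click"; the gesture evidence flips the fused decision.
best = fuse_intents({"search": 0.6, "click": 0.4},
                    {"click": 0.9, "select": 0.1})
```

A persistent pointing signal (as in the eye-gaze strategy above) could be modeled by raising the gesture modality's weight when pointing confidence stays high across frames.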
Similar references
Capacitated Single Allocation P-Hub Covering Problem in Multi-modal Network Using Tabu Search
The goals of hub location problems are finding the location of hub facilities and determining the allocation of non-hub nodes to these located hubs. In this work, we discuss the multi-modal single allocation capacitated p-hub covering problem over fully interconnected hub networks, and we provide a formulation to this end. The purpose of our model is to find the location of hubs and the ...
Using hotspots as a novel method for accessing key events in a large multi-modal corpus
In 2009 we created the D64 corpus, a multi-modal corpus which consists of roughly eight hours of natural, non-directed spontaneous interaction in an informal setting. Five participants feature in the recordings and their conversations were captured by microphones (room, body mounted and head mounted), video cameras and a motion capture system. The large amount of video, audio and motion capture...
"Name That Song!" A Probabilistic Approach to Querying on Music and Text
We present a novel, flexible statistical approach for modelling music and text jointly. The approach is based on multi-modal mixture models and maximum a posteriori estimation using EM. The learned models can be used to browse databases with documents containing music and text, to search for music using queries consisting of music and text (lyrics and other contextual information), to annotate ...
How Do I Address You? Modelling addressing behaviour based on an analysis of multi-modal corpora of conversational discourse
Addressing is a special kind of referring and thus principles of multi-modal referring expression generation will also be basic for generation of address terms and addressing gestures for conversational agents. Addressing is a special kind of referring because of the different (second person instead of object) role that the referent has in the interaction. Based on an analysis of addressing beh...
Different Approaches to Build Multilingual Conversational Systems
The paper describes developments and results of the work being carried out during the European research project CATCH-2004 (Converse in AThens Cologne and Helsinki). The objective of the project is multi-modal, multi-lingual conversational access to information systems. This paper concentrates on issues of the multilingual telephony-based speech and natural language understanding components.